The detect color process examines an image for significant amounts of saturated color data. Thresholds may be set for the saturation, object size, and object count required for a positive result, and the algorithm automatically ignores color fringing and compression artifacts around the edges of dark objects. An additional option ignores the dominant image color, so that monochromatic documents printed on colored paper are considered 'not color'. This works in a manner similar to Virtual Bulb: all pixels with a hue similar to the dominant color are disregarded.
Because what constitutes a 'color' document varies from one use to another, the algorithm uses a number of parameters to define 'color', and provides a way to ignore monochromatic backgrounds such as colored paper.
Detect color works in the HSV colorspace (hue, saturation, value), and the threshold for 'color' is set using a minimum saturation and a minimum brightness (the 'V', or value, component), each expressed as a value in the range of 0 to 255. Pixels darker or less saturated than the minimums are disregarded. Detection of colored paper, for example, will require a high minimum brightness and a moderate to low saturation; detection of highlighter will generally use a high saturation and brightness; and detection of colored pen inks will require a low saturation and brightness. The IgnorePaperColor option finds the dominant hue of the image and ignores all pixels that fall near that hue.
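The per-pixel test described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ScanFix implementation: the names ColorThreshold and pixel_is_color are hypothetical, and the usual HSV definitions (V is the largest RGB channel, S is the channel spread divided by V) are assumed, both scaled to 0..255.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type, not part of the ScanFix API. */
typedef struct {
    uint8_t min_saturation; /* plays the role of MinimumSaturation, 0..255 */
    uint8_t min_brightness; /* plays the role of MinimumBrightness, 0..255 */
} ColorThreshold;

static uint8_t max3(uint8_t a, uint8_t b, uint8_t c)
{
    uint8_t m = a > b ? a : b;
    return m > c ? m : c;
}

static uint8_t min3(uint8_t a, uint8_t b, uint8_t c)
{
    uint8_t m = a < b ? a : b;
    return m < c ? m : c;
}

/* Assumed HSV convention: V = max(R,G,B), S = 255 * (max - min) / max.
   A pixel counts as 'color' only if it reaches both minimums. */
bool pixel_is_color(uint8_t r, uint8_t g, uint8_t b, ColorThreshold t)
{
    uint8_t v = max3(r, g, b);
    uint8_t lo = min3(r, g, b);
    uint8_t s = (v == 0) ? 0 : (uint8_t)((255u * (unsigned)(v - lo)) / v);
    return s >= t.min_saturation && v >= t.min_brightness;
}
```

With a highlighter-style threshold (high saturation and brightness), a bright yellow pixel passes while a dark red ink stroke does not; lowering both minimums lets the ink stroke through, which matches the guidance for colored pen inks.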
Once the pixels meeting the requirements for 'color' are found, they are grouped into objects, and the objects are checked against the minimum size. Objects that meet the minimum size are weighted by size (for example, an object twice as large as the minimum will count as two objects), and the weighted count is compared to the sensitivity threshold to determine if the image has sufficient color data to warrant an 'is color' determination.
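A minimal sketch of the size weighting, assuming each object has already been reduced to a single scalar size and that the weight is simply size divided by minimum size. The text does not say how size is measured, how small objects are weighted, or how Sensitivity maps to a required count, so those parts are assumptions or left out; WeightedCounts and weigh_objects are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical result type: weighted tallies of large and small objects. */
typedef struct {
    double big;   /* objects at or above the minimum size */
    double small; /* objects above 50% of, but below, the minimum size */
} WeightedCounts;

/* Assumption: weight = size / min_size, so an object twice the minimum
   size counts as two objects. Small objects use the same ratio here,
   which is a guess; objects at or below 50% are ignored. */
WeightedCounts weigh_objects(const double *sizes, size_t n, double min_size)
{
    WeightedCounts c = {0.0, 0.0};
    if (sizes == NULL || min_size <= 0.0)
        return c;
    for (size_t i = 0; i < n; ++i) {
        double ratio = sizes[i] / min_size;
        if (ratio >= 1.0)
            c.big += ratio;
        else if (ratio > 0.5)
            c.small += ratio;
    }
    return c;
}
```

The caller would then compare the big tally against whatever count the sensitivity setting implies.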
In addition to determining if an image has significant color content, the detect color process can also return the locations of the colored objects on the page. The locations are returned as a set of bounding boxes, containing the X, Y, width and height in pixels of each object. Overlapping boxes can be combined if desired. Bounding boxes are only returned if the image is determined to be color, and if the overall saturation of the image is less than the threshold.
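Combining overlapping boxes could look something like the sketch below. It assumes the overlap is measured as the intersection area expressed as a percentage of the smaller box's area, in line with the BoxOverlap parameter description; the Box type and both function names are hypothetical and not taken from the ScanFix headers.

```c
#include <stdbool.h>

/* Hypothetical box type using the (x, y, width, height) layout in pixels. */
typedef struct { int x, y, width, height; } Box;

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* True when the intersection covers at least overlap_percent of the
   smaller box's area. Boxes that do not intersect never merge. */
bool boxes_should_merge(Box a, Box b, int overlap_percent)
{
    int iw = imin(a.x + a.width, b.x + b.width) - imax(a.x, b.x);
    int ih = imin(a.y + a.height, b.y + b.height) - imax(a.y, b.y);
    if (iw <= 0 || ih <= 0)
        return false;
    long long inter = (long long)iw * ih;
    long long area_a = (long long)a.width * a.height;
    long long area_b = (long long)b.width * b.height;
    long long smaller = area_a < area_b ? area_a : area_b;
    return inter * 100 >= smaller * overlap_percent;
}

/* Smallest box that contains both inputs. */
Box merge_boxes(Box a, Box b)
{
    Box m;
    m.x = imin(a.x, b.x);
    m.y = imin(a.y, b.y);
    m.width = imax(a.x + a.width, b.x + b.width) - m.x;
    m.height = imax(a.y + a.height, b.y + b.height) - m.y;
    return m;
}
```

Two 10x10 boxes offset by 5 pixels in each direction share 25% of their area, so they would merge at an overlap setting of 25 but not at 50.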
The Reason field is used to provide feedback on why an image was determined 'color' or 'not color'. One or more of the following flags will be set when the image is processed, and each can be checked by bitwise ANDing the Reason field with the enumerated value:
- SCANFIX_DETECT_COLOR_UNSATURATED indicates that the overall saturation of the image is less than the minimum required saturation. If the image was expected to return 'is color' due to overall saturation, then the minimum saturation should be lowered. An overall saturation value near the minimum saturation will reduce the confidence of a 'not color' result.
- SCANFIX_DETECT_COLOR_NO_ITEMS indicates that no significant saturated objects were found in the image. If the image should have registered as 'color', then this indicates that the minimum saturation and/or brightness need to be lowered.
- SCANFIX_DETECT_COLOR_FEW_ITEMS indicates that large, saturated objects were found in the image, but not a number sufficient to trigger an 'is color' response. A low confidence will indicate a number close to the threshold. Increasing the sensitivity will result in an 'is color' result.
- SCANFIX_DETECT_COLOR_SMALL_ITEMS indicates that objects larger than 50% of the minimum size, but smaller than the minimum size, were found. A low confidence will indicate a significant number of small objects. Lowering the minimum size will convert these objects into large objects, and generate an 'is color' result.
- SCANFIX_DETECT_COLOR_SATURATED indicates that the overall saturation of the image was above the minimum required saturation. A low confidence in an 'is color' result based on average saturation indicates a saturation near the minimum.
- SCANFIX_DETECT_COLOR_FOUND_ITEMS indicates that the image had a sufficient number of large, saturated color objects to trigger an 'is color' response. A low confidence value means that the weighted number of objects was near the threshold. Lowering the sensitivity will raise the confidence.
- SCANFIX_DETECT_COLOR_IGNORED indicates that the overall saturation of the image was more than 50% of the minimum required saturation, and the dominant hue was ignored.
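Testing the Reason bitfield might look like the sketch below. The numeric values of the SCANFIX_DETECT_COLOR_* constants are not given here, so the example uses stand-in EXAMPLE_* constants with made-up bit positions; real code should use the enumerated values from the ScanFix header instead. The helper names reason_has and collect_hints are also hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* STAND-IN values for illustration only. Each EXAMPLE_* constant mirrors
   the SCANFIX_DETECT_COLOR_* flag of the same suffix; the real values
   come from the ScanFix header and may differ. */
enum {
    EXAMPLE_UNSATURATED = 1 << 0,
    EXAMPLE_NO_ITEMS    = 1 << 1,
    EXAMPLE_FEW_ITEMS   = 1 << 2,
    EXAMPLE_SMALL_ITEMS = 1 << 3,
    EXAMPLE_SATURATED   = 1 << 4,
    EXAMPLE_FOUND_ITEMS = 1 << 5,
    EXAMPLE_IGNORED     = 1 << 6
};

typedef struct { unsigned flag; const char *hint; } ReasonHint;

static const ReasonHint HINTS[] = {
    { EXAMPLE_UNSATURATED, "overall saturation below the minimum" },
    { EXAMPLE_NO_ITEMS,    "no significant saturated objects found" },
    { EXAMPLE_FEW_ITEMS,   "large objects found, but too few" },
    { EXAMPLE_SMALL_ITEMS, "objects found just under the minimum size" },
    { EXAMPLE_SATURATED,   "overall saturation above the minimum" },
    { EXAMPLE_FOUND_ITEMS, "enough large saturated objects found" },
    { EXAMPLE_IGNORED,     "dominant hue was ignored" }
};

/* A flag is set when the bitwise AND with the Reason value is nonzero. */
bool reason_has(unsigned reason, unsigned flag)
{
    return (reason & flag) != 0;
}

/* Writes a hint for every flag set in reason; returns how many were written. */
size_t collect_hints(unsigned reason, const char **out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < sizeof HINTS / sizeof HINTS[0] && n < max_out; ++i) {
        if (reason_has(reason, HINTS[i].flag))
            out[n++] = HINTS[i].hint;
    }
    return n;
}
```

Because several flags can be set at once, each one is tested separately rather than comparing the whole field for equality.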
The confidence is generated by combining the results of the image saturation and the weighted large and small object counts. A low confidence means that one or more of the determining factors in the decision was close to its threshold, while a high confidence means the values were far from the threshold. The Reason flags are intended as a guide for tuning: changing the minimum saturation, brightness, and size defines which groups of colored pixels are significant, and the sensitivity defines how much significant data is required for an 'is color' determination. The average saturation, big object count, and small object count provide information on what the algorithm found in the image.
- Subcode is set to SF_SUBCODE_DETECT_COLOR (33).
- u.SC33.IgnorePaperColor is set true to indicate that a colored background should be considered 'not color'.
- u.SC33.MinimumSaturation is a value in the range of 0 to 255 indicating the minimum saturation required for a pixel to be considered 'color'.
- u.SC33.MinimumBrightness is a value in the range of 0 to 255 indicating the minimum brightness ('V' in the HSV colorspace) required for a pixel to be considered 'color'.
- u.SC33.MinimumSize is the minimum size, in both length and width, required for a saturated object to be considered colored data.
- u.SC33.Sensitivity is a value from 0 to 100 indicating how many saturated objects are required for a high confidence 'is color' result.
- u.SC33.GenerateBoxes is set true to generate bounding boxes for all saturated objects over the minimum size.
- u.SC33.BoxOverlap is the percentage of area (of the smaller box) that must intersect before bounding boxes are merged.
- u.SC33.Reason is a bitfield that is set with various flags, indicating which processing steps were used and why an image was or was not determined color.
- u.SC33.IsColor returns true if the image is determined to be 'color'.
- u.SC33.BoxCount is the number of bounding boxes returned.
- u.SC33.BoxCoords is a pointer to a list of values representing the bounding boxes; organized as (x, y, width, height, ...). This list is allocated by the opcode during REQ_EXEC and freed during REQ_TERM.
- u.SC33.BigObjects returns the weighted count of colored objects over the minimum size found in the image.
- u.SC33.SmallObjects returns the weighted count of objects near, but less than, the minimum size found in the image.
- u.SC33.AverageSaturation returns the saturation of the average image color.
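Since the BoxCoords list is freed during REQ_TERM, a caller that needs the boxes later would have to copy them out first. The sketch below shows one way to do that, assuming the list holds plain int values in (x, y, width, height) order; the element type is not stated above, and ColorBox and unpack_boxes are hypothetical names rather than part of the ScanFix API.

```c
#include <stddef.h>

/* Hypothetical caller-owned copy of one bounding box, in pixels. */
typedef struct { int x, y, width, height; } ColorBox;

/* Copies up to max_out boxes from a flat (x, y, width, height, ...) list
   into caller-owned storage and returns the number copied.
   Assumption: the list elements are ints. */
size_t unpack_boxes(const int *coords, size_t box_count,
                    ColorBox *out, size_t max_out)
{
    if (coords == NULL || out == NULL)
        return 0;
    size_t n = box_count < max_out ? box_count : max_out;
    for (size_t i = 0; i < n; ++i) {
        out[i].x      = coords[4 * i + 0];
        out[i].y      = coords[4 * i + 1];
        out[i].width  = coords[4 * i + 2];
        out[i].height = coords[4 * i + 3];
    }
    return n;
}
```

In use, the first two arguments would come from BoxCoords and BoxCount after REQ_EXEC, and only when IsColor is true and GenerateBoxes was set.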